﻿\subsection{Loading a constant into a register}
\label{ARM_big_constants}

\subsubsection{32-bit ARM}
\label{ARM_big_constants_loading}

As we already know, all instructions have a length of 4 bytes in ARM mode and 2 bytes in Thumb mode.

Then how can we load a 32-bit value into a register, if it's not possible to encode it in one instruction?

Let's try:

\begin{lstlisting}[style=customc]
unsigned int f()
{
	return 0x12345678;
};
\end{lstlisting}

\begin{lstlisting}[caption=GCC 4.6.3 -O3 \ARMMode,style=customasmARM]
f:
        ldr     r0, .L2
        bx      lr
.L2:
        .word   305419896 ; 0x12345678
\end{lstlisting}

So, the \TT{0x12345678} value is just stored aside in memory and loaded if needed.

But it's possible to get rid of the additional memory access.

\begin{lstlisting}[caption=GCC 4.6.3 -O3 -march{=}armv7-a (\ARMMode),style=customasmARM]
movw    r0, #22136      ; 0x5678
movt    r0, #4660       ; 0x1234
bx      lr
\end{lstlisting}

We see that the value is loaded into the register by parts, the lower part first (using \INS{MOVW}), 
then the higher (using \INS{MOVT}).

This implies that 2 instructions are necessary in ARM mode for loading a 32-bit value into a register.

It's not a real problem, because in fact there are not many constants in real code (except of 0 and 1).

Does it mean that the two-instruction version is slower than one-instruction version?

Doubtfully. Most likely, modern ARM processors are able to detect such sequences and execute
them fast.

On the other hand, \IDA is able to detect such patterns in the code and disassembles this function as:

\begin{lstlisting}[style=customasmARM]
MOV    R0, 0x12345678
BX     LR
\end{lstlisting}

\subsubsection{ARM64}

\begin{lstlisting}[style=customc]
uint64_t f()
{
	return 0x12345678ABCDEF01;
};
\end{lstlisting}

\begin{lstlisting}[caption=GCC 4.9.1 -O3,style=customasmARM]
mov	x0, 61185   ; 0xef01
movk	x0, 0xabcd, lsl 16
movk	x0, 0x5678, lsl 32
movk	x0, 0x1234, lsl 48
ret
\end{lstlisting}

\myindex{ARM!\Instructions!MOVK}
\TT{MOVK} stands for \q{MOV Keep}, i.e., it writes a 16-bit value into the register, not touching the rest of the bits.
\myindex{ARM!Optional operators!LSL}
The \TT{LSL} suffix shifts left the value by 16, 32 and 48 bits at each step. The shifting is done before loading.

This implies that 4 instructions are necessary to load a 64-bit value into a register.

\myparagraph{Storing floating-point number into register}

It's possible to store a floating-point number into a D-register using only one instruction.

For example:

\begin{lstlisting}[style=customc]
double a()
{
	return 1.5;
};
\end{lstlisting}

\begin{lstlisting}[caption=GCC 4.9.1 -O3 + objdump,style=customasmARM]
0000000000000000 <a>:
   0:   1e6f1000        fmov    d0, #1.500000000000000000e+000
   4:   d65f03c0        ret
\end{lstlisting}

The number $1.5$ was indeed encoded in a 32-bit instruction.
But how?
\myindex{ARM!\Instructions!FMOV}

In ARM64, there are 8 bits in the \TT{FMOV} instruction for encoding some floating-point numbers.

The algorithm is called \TT{VFPExpandImm()} in \ARMSixFourRefURL.
\myindex{minifloat}
This is also called \IT{minifloat}\footnote{\href{http://go.yurichev.com/17139}{wikipedia}}.

We can try different values: the compiler is able to encode $30.0$ and $31.0$, but it couldn't encode $32.0$,
as 8 bytes have to be allocated for this number in the IEEE 754 format:

\begin{lstlisting}[style=customc]
double a()
{
	return 32;
};
\end{lstlisting}

\begin{lstlisting}[caption=GCC 4.9.1 -O3,style=customasmARM]
a:
	ldr	d0, .LC0
	ret
.LC0:
	.word	0
	.word	1077936128
\end{lstlisting}
